Abstract — The fundamental task of a digital receiver is to 
decide the transmitted symbols in the best possible way, i.e., 
with respect to an appropriately defined performance metric. 
Examples of usual performance metrics are the probability of 
error and the Mean Square Error (MSE) of a symbol estimator. 
In a coherent receiver, the symbol decisions are made based on 
the use of a channel estimate. This paper focuses on examining 
the optimality of usual estimators such as the minimum variance 
unbiased (MVU) and the minimum mean square error (MMSE) 
estimators for these metrics and on proposing better estimators 
whenever it is necessary. For illustration purposes, this study 
is performed on a toy channel model, namely a single input 
single output (SISO) flat fading channel with additive white 
Gaussian noise (AWGN). In this way, this paper highlights the 
design dependencies of channel estimators on target performance 
metrics. 

Index Terms — Minimum mean square error (MMSE), mini- 
mum variance unbiased (MVU), probability of error, single input 
single output (SISO). 



I. Introduction 

SIGNAL estimation and detection are two main concerns 
in the course of designing a communication system [5], 
[13], [22]. The main goal is to design optimal demodulators 
at the receiver side providing the detector with the necessary 
sufficient statistics for its decision on the transmitted symbol at 
a specific observation interval. Furthermore, the optimization 
of the decision device is also a target, i.e., its design based 
on statistical tests that rely on sufficient statistics and 
minimize the probability of error. A different setup of optimal 
designs related to radar and sonar systems is to detect the 
presence of either a deterministic or random signal in noise 
with least probability of error or false alarm [17]. Although 
the two aforementioned setups have conceptual differences, 
they are usually treated in the same fashion. First, an optimal 
demodulator is necessary to deliver the sufficient statistics to 
the decision device. Then, the decision device, which optimally 
uses these sufficient statistics, has to be derived. The optimal 
design of the decision device is formulated in any case as 
a hypothesis testing problem. Moreover, the optimization of 
the transmitter is another related problem. In this case, the 
problem turns out to be the design of optimal transmission sets, 
such that the end performance metric, i.e., the probability of 
error, is minimized. 

The authors are with ACCESS Linnaeus Center, School of Electrical 
Engineering, KTH Royal Institute of Technology, SE 100-44, Stockholm, 
Sweden. E-mail: dimitrik@kth.se, cristian.rojas@ee.kth.se, hjalmars@kth.se, 
mats.bengtsson@ee.kth.se, mikael.skoglund@ee.kth.se. 

Depending on the degree of knowledge about the transmission 
channel at the receiver side, the detector can be coherent, 
semi-coherent or noncoherent [22]. The more information 
about the transmission channel is available, the better the 
receiver's performance will be. This justifies the fact that 
receivers usually have a built-in channel estimator. In 
the communication and signal processing literature, the usual 
channel estimators are the minimum variance unbiased (MVU) 
and the minimum mean square error (MMSE) estimators [16]. 
The combination of these channel estimators with the optimal 
decision devices is usually considered to address the problem 
of determining the optimal receiver. 

Current physical layer (PHY) standards that have attracted a 
lot of attention both from the mobile industry and the research 
community are the Wireless Interoperability for Microwave 
Access (WiMAX), the Long Term Evolution (LTE) and the 
Digital Video Broadcasting (DVB) either in its terrestrial 
(DVB-T) or its Handheld (DVB-H) versions [2], [7], [8], [9], 
[21], [24], [26]. These standards are orthogonal frequency 
division multiple access (OFDMA) based and they can sat- 
isfy the need for shorter communication links to provide 
truly broadband connectivity services. In these systems, either 
MVU/least squares (LS) or MMSE channel estimators are 
used, usually employing some sort of estimate interpolation 
through the frame if the goal is to track a time-varying channel 
[1], [4], [11], [18], [19], [20], [23], [25], [27]. 

In this paper, we re-examine the validity of the common 
belief that the MVU and MMSE channel estimators are the 
best choices to be combined with the optimal detectors, delivering 
an overall optimal receiver, when finite-sample training 
is used to estimate the channel.¹ To this end, ideas originating 
from the system identification field are employed. Recent 
results in optimal experiment design indicate that it is better 
to design the optimal training for the estimation of a certain 
set of unknown parameters with respect to optimizing the end 
performance metric rather than the mean square error of the 
parameter estimator itself [3], [10], [14], [15]. We will slightly 
modify this idea and examine whether the aforementioned 
channel estimators are the best choices when the selection of 
the channel estimator is made with respect to an appropriately 
defined end performance metric. For illustration purposes, 
this study is performed on a toy channel model, namely a 
single input single output (SISO) flat fading channel with 
additive white Gaussian noise (AWGN).² The initial focus is 
on two different MSE criteria. These MSE criteria serve to 
demonstrate the dependence of the optimal channel estimators 
on the end performance metrics. Their choice is based on 
the simplicity of the analysis that they allow. Then, using 
the obtained results, we will examine the case of the error 
probability as the performance metric of interest. We show 
that for several performance metrics examined in this paper, 
the MVU and MMSE channel estimators are suboptimal, and 
we propose ways to obtain better channel estimators. Finally, 
we numerically compare the performances of the derived 
channel estimators with those of the MVU and MMSE channel 
estimators for all performance metrics in this paper. These 
comparisons verify that the optimality of the usual channel 
estimators with respect to common end performance metrics 
is questionable. 

¹In this sense, the asymptotic efficiency of the maximum likelihood (ML) 
estimator together with its invariance property are irrelevant. 

²In this toy model, the MVU estimator coincides with the LS and the ML 
channel estimators. 

This paper is organized as follows: Section II defines the 
problem of designing the channel estimator with respect to 
the end performance metric. Section III presents some results 
and comments that will be useful in the rest of the paper, 
and introduces approximations of the performance metrics 
on which the rest of the analysis is based. The optimality 
of the MVU and MMSE channel estimators with respect to 
the minimization of the symbol estimate MSE is examined 
in Section IV and subsections therein, while uniformly better 
channel estimators are also proposed. The same analysis as 
in Section IV is pursued in Section V for a differently 
defined symbol estimate MSE and in Section VI for a rough 
approximation (variation) of the error probability performance 
metric. Section VII illustrates the validity of the derived 
results. Finally, Section VIII concludes the paper. 

II. Problem Statement 

The received signal model for a SISO system, when the 
channel is considered to be narrowband block fading, is given 
as follows: 

$$y(n) = h\,x(n) + w(n), \tag{1}$$

where $y(n)$ is the observed signal at the receiver side at time 
instant $n$, $h$ is the complex channel impulse response coefficient, 
$x(n)$ is the transmitted symbol at the same time instant 
taken from an $M$-ary constellation $\mathcal{X} = \{x_1, x_2, \ldots, x_M\}$ 
and $w(n)$ is complex, circularly symmetric, Gaussian noise 
with zero mean and variance $\sigma_w^2$. Given an equiprobable 
distribution on the constellation symbols, we further assume 
that $E[x(n)] = 0$ and $E[|x(n)|^2] = \sigma_x^2$, while our modulation 
method is memoryless. In addition, $w(n)$ and $x(n)$ are 
independent random sequences, while $w(n)$ is a white random 
sequence. 
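As an illustration of the model in (1), the following minimal sketch draws one block of received samples. It assumes unit-energy QPSK (so $\sigma_x^2 = 1$); the helper names `complex_gaussian` and `simulate_block`, the channel value and the noise variance are hypothetical choices made only for this example.

```python
import math
import random

# Unit-energy QPSK constellation (sigma_x^2 = 1); an illustrative assumption.
QPSK = [complex(a, b) / math.sqrt(2) for a in (1, -1) for b in (1, -1)]


def complex_gaussian(variance, rng):
    """Draw one circularly symmetric complex Gaussian sample with E|w|^2 = variance."""
    s = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def simulate_block(h, n_symbols, noise_var, rng):
    """Return (x, y) with y(n) = h * x(n) + w(n) for equiprobable QPSK symbols."""
    x = [rng.choice(QPSK) for _ in range(n_symbols)]
    y = [h * xn + complex_gaussian(noise_var, rng) for xn in x]
    return x, y


rng = random.Random(0)  # fixed seed so the sketch is reproducible
x, y = simulate_block(h=0.8 - 0.3j, n_symbols=4, noise_var=0.1, rng=rng)
```

The same channel coefficient multiplies every symbol of the block, which is the block-fading assumption stated above.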

Assume that a maximum energy $\mathcal{E}$ and a training length of 
$B$ time slots are available at the transmitter for training. We 
can collect the received samples corresponding to training in 
one vector: 

$$\mathbf{y}_{tr} = h\,\mathbf{x}_{tr} + \mathbf{w}_{tr}, \tag{2}$$

where $\mathbf{y}_{tr} = [y(l-B+1), y(l-B+2), \ldots, y(l)]^T$ is 
the vector of $B$ received samples corresponding to 
training, $\mathbf{x}_{tr} = [x(l-B+1), x(l-B+2), \ldots, x(l)]^T$ 
is the vector of $B$ training symbols and 
$\mathbf{w}_{tr} = [w(l-B+1), w(l-B+2), \ldots, w(l)]^T$ is the vector of $B$ 
noise samples. Considering the class of linear channel estimators, 
the channel is estimated as follows: 

$$\hat{h} = \mathbf{f}^H \mathbf{y}_{tr}, \tag{3}$$

where $\mathbf{f}$ is a $B \times 1$ channel estimating filter. 

With the assumptions in (1), if the constellation symbols 
are equiprobable and the channel is perfectly known, the ML 
detector is optimal [5], [22]. This is with respect to minimizing 
the probability that a different symbol from the one transmitted 
is decided given the transmitted symbol. The ML decision rule 
is given by the following expression: 

$$\mathrm{dec}[x(n)](h) = \arg\min_{x(n) \in \mathcal{X}} |y(n) - h\,x(n)|^2. \tag{4}$$

Here, $\mathrm{dec}[x(n)]$ denotes the decision of the detector, when the 
transmitted symbol is $x(n)$. In essence, the ML detector minimizes 
the probability of error, when the transmitted symbols 
are equiprobable. When the receiver has a channel estimate $\hat{h}$, 
$h$ is replaced by $\hat{h}$ in the last expression. 
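A minimal sketch of the decision rule (4) with a plugged-in channel estimate is given below. The QPSK constellation and the function name `ml_detect` are assumptions made for illustration only.

```python
import math

# Unit-energy QPSK constellation; an illustrative assumption.
QPSK = [complex(a, b) / math.sqrt(2) for a in (1, -1) for b in (1, -1)]


def ml_detect(y_n, h_est, constellation=QPSK):
    """Return the constellation point minimizing |y(n) - h_est * x|^2, cf. (4)."""
    return min(constellation, key=lambda x: abs(y_n - h_est * x) ** 2)


h = 0.8 - 0.3j
sent = QPSK[2]
# Noiseless observation, detected once with the true channel and once with a
# slightly mismatched estimate.
decided_exact = ml_detect(h * sent, h_est=h)
decided_mismatch = ml_detect(h * sent, h_est=1.05 * h)
```

With a small estimation error and no noise, both calls return the transmitted symbol; the effect of the estimate on the error probability only shows up once noise is added.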

A different kind of performance metric is the MSE of 
a linear symbol estimator. In this paper, we will call the 
symbol estimator an equalizer. The equalizer uses the channel 
knowledge and delivers a soft decision of the transmitted 
symbol, i.e., a symbol estimate. We will call clairvoyant the 
equalizer that has perfect channel knowledge. Denoting this 
equalizer by $c(h)$, we can find its mathematical expression as 
follows: 

$$c(h) = \arg\min_{c(h)} E\left[|c(h)\,y(n) - x(n)|^2\right], \tag{5}$$

where the expectation is taken over the statistics of $x(n)$ and 
$w(n)$. If we set the derivative of the last expression with 
respect to $c(h)$ to zero and we solve for $c(h)$, then the optimal 
clairvoyant equalizer is given by the expression 

$$c(h) = \frac{\sigma_x^2\, h^*}{|h|^2 \sigma_x^2 + \sigma_w^2}. \tag{6}$$

We will call this the MMSE clairvoyant equalizer. We observe 
that as the SNR increases, i.e., $\sigma_w^2 \to 0$, $c(h) \to 1/h$. We will 
call $c(h) = 1/h$ the Zero Forcing (ZF) clairvoyant equalizer. 
Using the above definitions and assuming that the receiver 
has only an estimate of the channel, the system performance 
metric is the symbol estimate MSE: 

$$\mathrm{MSE}_x = E\left[|c(\hat{h})\,y(n) - x(n)|^2\right]. \tag{7}$$
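The closed form of the clairvoyant MMSE equalizer in (6) follows from a short calculation, sketched here under the stated assumptions ($x(n)$ and $w(n)$ independent and zero mean, with powers $\sigma_x^2$ and $\sigma_w^2$). The shorthand $J(c)$ is introduced only for this sketch.

```latex
\begin{align*}
J(c) &= E\left[|c\,y(n) - x(n)|^2\right]
      = |c|^2\left(|h|^2\sigma_x^2 + \sigma_w^2\right)
        - 2\,\mathrm{Re}\{c\,h\}\,\sigma_x^2 + \sigma_x^2, \\
\frac{\partial J}{\partial c^*} &= c\left(|h|^2\sigma_x^2 + \sigma_w^2\right) - h^*\sigma_x^2 = 0
\;\Longrightarrow\;
c(h) = \frac{\sigma_x^2\,h^*}{|h|^2\sigma_x^2 + \sigma_w^2}, \\
\lim_{\sigma_w^2 \to 0} c(h) &= \frac{h^*}{|h|^2} = \frac{1}{h}.
\end{align*}
```

The last line recovers the ZF equalizer as the high-SNR limit of the MMSE equalizer.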



The MSE given by (7) can be defined in two different ways. 
If we assume that the channel is an unknown but otherwise 
deterministic quantity, then the expectation in (7) does not 
consider $h$. This leads to an MSE expression dependent on the 
unknown channel $h$. In this case, only the channel estimators 
that treat the channel as an unknown deterministic variable 
are meaningful. If we assume that the unknown channel is 
a random variable, then we can average the MSE expression 
over $h$. In this case, estimators that treat the channel either as 
an unknown deterministic variable or as a random variable are 
meaningful. The former represents the case where the system 
designer chooses to ignore the knowledge of the channel 
statistics in the selection of the channel estimator for some 
reason. 

In the following, we focus on the ZF equalizer, which 
becomes optimal as the SNR increases. This choice is made 
to preserve the simplicity of this paper and to highlight the 
derived results. 

The previous MSE definition implies the definition of yet 
another MSE that is meaningful in the context of communication 
systems. Given an equalizer, we can define the excess of 
the symbol estimate based on an equalizer that only knows 
a channel estimate over the equalizer with perfect channel 
knowledge, thus leading to 

$$\mathrm{MSE}_{xe} = E\left[|c(\hat{h})\,y(n) - c(h)\,y(n)|^2\right]. \tag{8}$$

In the sequel, this metric will be called excess MSE. 

Our goal will be to determine the optimal channel estimators 
for fixed training sequences so that each performance metric 
based on a given equalizer is minimized. To this end, the 
following section presents some useful ideas. 

III. Preliminary Results 

Consider the MVU estimator. Since it is an unbiased estimator, 
it satisfies $\mathbf{f}^H \mathbf{x}_{tr} = 1$. This condition implies that 
$E[\hat{h}] = h$. For our problem assumptions, the MVU estimator 
can be found by solving the following optimization problem: 

$$\min_{\mathbf{f}} \ \sigma_w^2 \|\mathbf{f}\|^2 \quad \text{s.t.} \quad \mathbf{f}^H \mathbf{x}_{tr} = 1. \tag{9}$$

Forming the Lagrangian for this problem and zeroing its 
gradient with respect to $\mathbf{f}$, we get: 

$$\mathbf{f}_{MVU} = \frac{\mathbf{x}_{tr}}{\|\mathbf{x}_{tr}\|^2}. \tag{10}$$

For the sake of completeness, we note that this estimator coincides 
with the ML and LS channel estimators under our assumptions. 

If we assume that the prior distribution of $h$ is known, 
then instead of the MVU one could use the MMSE channel 
estimator. With our assumptions and the extra assumption that 
$E[h] = 0$, one can obtain [16] 

$$\mathbf{f}_{MMSE} = \frac{E[|h|^2]\,\mathbf{x}_{tr}}{E[|h|^2]\,\|\mathbf{x}_{tr}\|^2 + \sigma_w^2}. \tag{11}$$
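The two standard filters above can be sketched in a few lines. The helper names (`inner`, `energy`, `f_mvu`, `f_mmse`, `estimate`) and the numerical values are hypothetical; training is taken noiseless here purely to make the unbiasedness of the MVU filter and the shrinkage of the MMSE filter visible.

```python
def inner(f, v):
    """Return f^H v for two equal-length complex sequences."""
    return sum(fi.conjugate() * vi for fi, vi in zip(f, v))


def energy(v):
    """Return ||v||^2."""
    return sum(abs(vi) ** 2 for vi in v)


def f_mvu(x_tr):
    """MVU (equivalently LS/ML) filter x_tr / ||x_tr||^2."""
    e = energy(x_tr)
    return [xi / e for xi in x_tr]


def f_mmse(x_tr, channel_power, noise_var):
    """Linear MMSE filter for a zero-mean channel with E|h|^2 = channel_power."""
    scale = channel_power / (channel_power * energy(x_tr) + noise_var)
    return [scale * xi for xi in x_tr]


def estimate(f, y_tr):
    """Linear channel estimate h_hat = f^H y_tr."""
    return inner(f, y_tr)


x_tr = [1 + 1j, 1 - 1j, -1 + 1j]
h = 0.5 + 0.2j
y_tr = [h * xi for xi in x_tr]  # noiseless training, for illustration only
h_mvu = estimate(f_mvu(x_tr), y_tr)
h_mmse = estimate(f_mmse(x_tr, channel_power=1.0, noise_var=0.5), y_tr)
```

In this noiseless example the MVU estimate reproduces $h$ exactly, while the MMSE estimate is a scaled-down version of it.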

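The $\mathrm{MSE}_x$ of the ZF equalizer using a deterministic channel 
("dc") assumption is 

$$\mathrm{MSE}_x^{dc}(\mathrm{ZF}) = \sigma_x^2\, E\left[\left|\frac{h - \hat{h}}{\hat{h}}\right|^2\right] + \sigma_w^2\, E\left[\frac{1}{|\hat{h}|^2}\right], \tag{12}$$

the corresponding expression for a random channel ("rc") is 

$$\mathrm{MSE}_x^{rc}(\mathrm{ZF}) = E_h\left[\mathrm{MSE}_x^{dc}(\mathrm{ZF})\right], \tag{13}$$

while for the $\mathrm{MSE}_{xe}$ we accordingly have 

$$\mathrm{MSE}_{xe}^{dc}(\mathrm{ZF}) = \frac{|h|^2\sigma_x^2 + \sigma_w^2}{|h|^2}\, E\left[\left|\frac{h - \hat{h}}{\hat{h}}\right|^2\right] \tag{14}$$

(cf. (8)). The $\mathrm{MSE}_{xe}^{rc}$ is obtained by averaging the last 
expression over $h$. 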

Depending on the probability distributions of $\hat{h}$ and $|h|$, 
the above MSE expressions may fail to exist. The MSEs will 
be finite if the probability density function (pdf) of $|\hat{h}|$ 
is of order $O(|\hat{h}|^2)$ as $|\hat{h}| \to 0$. A similar condition should 
hold for the pdf of $|h|$ in the case of $\mathrm{MSE}_{xe}$. In the opposite 
case, we end up with an infinite moment problem. In order 
to obtain well-behaved channel estimators that will be used 
in conjunction with the actual performance metrics, some 
sort of regularization is needed. Some ideas for appropriate 
regularization techniques may be obtained by modifying 
robust estimators (against heavy-tailed distributions), e.g., by 
trimming a standard estimator if it gives a value very close to 
zero [12]. An example of such a trimmed estimator is given 
as follows: 

$$\hat{h} = \begin{cases} \mathbf{f}^H \mathbf{y}_{tr}, & \text{if } |\mathbf{f}^H \mathbf{y}_{tr}| > \lambda, \\[4pt] \lambda\, \dfrac{\mathbf{f}^H \mathbf{y}_{tr}}{|\mathbf{f}^H \mathbf{y}_{tr}|}, & \text{otherwise}, \end{cases} \tag{15}$$

where $\mathbf{f}$ can be any estimator and $\lambda$ a regularization parameter.³ 

Remark: The definition of the trimmed $\hat{h}$ preserves continuity 
at $|\mathbf{f}^H \mathbf{y}_{tr}| = \lambda$. 
Additionally, the event $\{\mathbf{f}^H \mathbf{y}_{tr} = 0\}$ has zero probability 
since the distribution of $\mathbf{f}^H \mathbf{y}_{tr}$ is continuous. Therefore, in 
this case $\hat{h}$ can be arbitrarily defined, e.g., $\hat{h} = \lambda$. 
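A minimal sketch of a trimmed estimator consistent with the remark above is shown next. It assumes that small estimates are projected onto the circle of radius $\lambda$ while keeping their phase, which is the form that is continuous at $|\mathbf{f}^H\mathbf{y}_{tr}| = \lambda$; the function name `trimmed_estimate` is hypothetical.

```python
def trimmed_estimate(f, y_tr, lam):
    """Trim the linear estimate f^H y_tr so that its magnitude is at least lam."""
    raw = sum(fi.conjugate() * yi for fi, yi in zip(f, y_tr))
    mag = abs(raw)
    if mag > lam:
        return raw                    # estimate is far enough from zero
    if mag == 0.0:
        return complex(lam, 0.0)      # zero-probability event; arbitrary convention
    return lam * raw / mag            # keep the phase, lift the magnitude to lam


small = trimmed_estimate([1 + 0j], [0.05j], lam=0.1)
large = trimmed_estimate([1 + 0j], [0.5 + 0j], lam=0.1)
```

Because the trimmed estimate never has magnitude below $\lambda$, quantities such as $1/|\hat{h}|^2$ stay bounded by $1/\lambda^2$.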

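We focus now on the $\mathrm{MSE}_x^{dc}(\mathrm{ZF})$. Assume a fixed $\lambda$. In 
the appendix, we show that, for a sufficiently small $\lambda$ and a 
sufficiently high SNR during training, minimizing $\mathrm{MSE}_x^{dc}(\mathrm{ZF})$ 
is equivalent to minimizing the following approximation: 

$$\left[\mathrm{MSE}_x^{dc}(\mathrm{ZF})\right]_0 = \sigma_x^2\, \frac{E\left[|h - \hat{h}|^2\right]}{E\left[|\hat{h}|^2\right]} + \sigma_w^2\, \frac{1}{E\left[|\hat{h}|^2\right]}. \tag{16}$$

Following similar steps and using some minor additional 
technicalities, we can work with 

$$\left[\mathrm{MSE}_x^{rc}(\mathrm{ZF})\right]_0 = \sigma_x^2\, \frac{E_h E\left[|h - \hat{h}|^2\right]}{E_h E\left[|\hat{h}|^2\right]} + \sigma_w^2\, \frac{1}{E_h E\left[|\hat{h}|^2\right]} \tag{17}$$

instead of $\mathrm{MSE}_x^{rc}(\mathrm{ZF})$. Moreover, $[\mathrm{MSE}_{xe}^{dc}(\mathrm{ZF})]_0$ and 
$[\mathrm{MSE}_{xe}^{rc}(\mathrm{ZF})]_0$ can be defined accordingly. We will call the 
last approximations zeroth order symbol estimate MSEs and 
excess MSEs, respectively. The following analysis and results 
will be based on the zeroth order metrics and they will reveal 
the dependency of the channel estimator's selection on the 
considered (any) end performance metric. 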
³This parameter can be tuned via cross-validation or any other technique, 
although in the simulation section we select it empirically for simplicity. 

Remarks: 

1) A useful, alternative way to consider the zeroth order 
MSEs is to view them as affine versions of normalized 
channel MSEs, where the actual true channel is $h$ and 
the estimator is $\hat{h}$. 

2) In the definition of (16), one can observe that after 
approximating the mean value of the ratio by the ratio 
of the mean values the infinite moment problem is 
eliminated. In the following, all zeroth order metrics 
will be defined based on the non-trimmed $\hat{h}$ to ease the 
derivations. This treatment is approximately valid when 
$\lambda$ is sufficiently small, as shown in eq. (42) 
of the appendix. 
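For a linear estimator $\hat{h} = \mathbf{f}^H\mathbf{y}_{tr} = h\,\mathbf{f}^H\mathbf{x}_{tr} + \mathbf{f}^H\mathbf{w}_{tr}$ with white training noise, both expectations in (16) have closed forms, so the zeroth order metric can be evaluated without simulation. The sketch below does this; the function names and the numerical values are hypothetical.

```python
def inner(f, v):
    """Return f^H v."""
    return sum(fi.conjugate() * vi for fi, vi in zip(f, v))


def zeroth_order_mse_dc(f, x_tr, h, sig_x2, sig_w2):
    """Zeroth order symbol estimate MSE, cf. (16), for h_hat = f^H y_tr."""
    psi = inner(f, x_tr)                                       # f^H x_tr
    f_norm2 = sum(abs(fi) ** 2 for fi in f)                    # ||f||^2
    err = abs(h) ** 2 * abs(psi - 1) ** 2 + sig_w2 * f_norm2   # E|h - h_hat|^2
    pwr = abs(h) ** 2 * abs(psi) ** 2 + sig_w2 * f_norm2       # E|h_hat|^2
    return (sig_x2 * err + sig_w2) / pwr


x_tr = [1 + 0j, 1 + 0j]
f_mvu = [xi / 2.0 for xi in x_tr]   # x_tr / ||x_tr||^2 with ||x_tr||^2 = 2
value = zeroth_order_mse_dc(f_mvu, x_tr, h=1 + 0j, sig_x2=1.0, sig_w2=0.5)
```

For these numbers the MVU filter gives $E|h-\hat{h}|^2 = 0.25$ and $E|\hat{h}|^2 = 1.25$, so the metric evaluates to $(0.25 + 0.5)/1.25 = 0.6$.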

IV. Minimizing the Zeroth Order Symbol Estimate MSE 

We now examine the zeroth order symbol estimate MSE in 
the case of the ZF equalizer. The optimality of the MVU and 
MMSE channel estimators will be investigated. Additionally, 
the training sequence is assumed fixed. 

A. ZF Equalization 

The channel is considered either deterministic or random, 
depending on the available knowledge of a priori channel 
statistics and the will of the system designer to ignore or to 
exploit this knowledge. 

1) Deterministic Channel: The expectation operators in eq. 
(16) are with respect to $\mathbf{w}_{tr}$, $x(n)$ and $w(n)$. We have: 

$$\left[\mathrm{MSE}_x^{dc}(\mathrm{ZF})\right]_0 = \sigma_x^2\, \frac{|h|^2 |\psi - 1|^2 + \sigma_w^2 \|\mathbf{f}\|^2 + \sigma_w^2/\sigma_x^2}{|h|^2 |\psi|^2 + \sigma_w^2 \|\mathbf{f}\|^2}, \tag{18}$$

where $\psi = \mathbf{f}^H \mathbf{x}_{tr}$. 
The numerator of the gradient of the above expression with 
respect to $\mathbf{f}$, discarding the outer $\sigma_x^2$, is given by the following 
expression⁴: 

$$\left[|h|^2 |\psi|^2 + \sigma_w^2 \|\mathbf{f}\|^2\right]\left[|h|^2 (\psi - 1)^* \mathbf{x}_{tr} + \sigma_w^2 \mathbf{f}\right] - \left[|h|^2 \psi^* \mathbf{x}_{tr} + \sigma_w^2 \mathbf{f}\right]\left[\frac{\sigma_w^2}{\sigma_x^2} + |h|^2 |\psi - 1|^2 + \sigma_w^2 \|\mathbf{f}\|^2\right]. \tag{19}$$

Setting $\mathbf{f} = \mathbf{f}_{MVU}$, we obtain: 

$$-\frac{\sigma_w^2}{\sigma_x^2}\left(|h|^2 + \frac{\sigma_w^2}{\|\mathbf{x}_{tr}\|^2}\right)\mathbf{x}_{tr}. \tag{20}$$

Note that no choice of $\mathbf{x}_{tr}$ will zero this expression for 
any $|h|^2, \sigma_w^2$. Therefore, the MVU is not an optimal channel 
estimator in this case. We can state this result more formally: 

Proposition 1: The MVU estimator is not an optimal channel 
estimator for the task of minimizing $[\mathrm{MSE}_x^{dc}(\mathrm{ZF})]_0$, 
when the channel is considered a deterministic but otherwise 
unknown quantity. 

The question that arises in this case is how to find the 
optimal channel estimator in this setup or generally how to 
determine a uniformly better channel estimator for minimizing 
$[\mathrm{MSE}_x^{dc}(\mathrm{ZF})]_0$. Equating (19) to $\mathbf{0}$ and taking the inner 
product of both sides with $\mathbf{f}$, we obtain the following necessary 
condition that every optimal channel estimating filter $\mathbf{f}$ must 
satisfy given the training sequence $\mathbf{x}_{tr}$: 

$$\mathbf{f}^H \mathbf{x}_{tr} = 1 + \frac{\sigma_w^2}{\sigma_x^2 |h|^2}. \tag{21}$$

A possible $\mathbf{f}$ that satisfies this condition is 

$$\mathbf{f}_{opt}^{dcZF} = \left(1 + \frac{\sigma_w^2}{\sigma_x^2 |h|^2}\right)\mathbf{f}_{MVU}, \tag{22}$$

which becomes: 

$$f_{opt}^{dcZF} = \left(1 + \frac{\sigma_w^2}{\sigma_x^2 |h|^2}\right)\frac{1}{x_{tr}^*} \tag{23}$$

for $B = 1$. Clearly, (22) is sufficient for (19) to become zero. 
However, (22) has another problem, namely that the optimal 
solution depends on the unknown channel $h$. 

In order to deal with the dependence of the optimal estimator 
on the unknown channel, we will resort to a stochastic 
approach. We will assume a noninformative prior distribution 
for the unknown channel. If the real and imaginary parts of 
the channel are considered bounded in the intervals $I_R \subset \mathbb{R}$ 
and $I_I \subset \mathbb{R}$,⁵ then the receiver can treat them as 
independent random variables uniformly distributed on $I_R$ 
and $I_I$, respectively. The $[\mathrm{MSE}_x^{dc}(\mathrm{ZF})]_0$ 
is now replaced by $E_h^{unif}\left\{[\mathrm{MSE}_x^{dc}(\mathrm{ZF})]_0\right\}$, 
where $E_h^{unif}[\cdot]$ denotes the expectation 
with respect to the joint (uniform) distribution of the real 
and imaginary parts of $h$. Applying again the zeroth order 
approximation and following the above analysis⁶, we can 
easily show that eqs. (21), (22) and (23) give again the 
necessary condition and optimal estimators in this case with 
the substitution of $|h|^2$ by $E_h^{unif}[|h|^2]$. 

2) Random Channel: In this case, the actual prior statistics 
of the channel are known. The zeroth order symbol estimate 
MSE is given by 

$$\left[\mathrm{MSE}_x^{rc}(\mathrm{ZF})\right]_0 = \sigma_x^2\, \frac{E[|h|^2]\,|\psi - 1|^2 + \sigma_w^2 \|\mathbf{f}\|^2 + \sigma_w^2/\sigma_x^2}{E[|h|^2]\,|\psi|^2 + \sigma_w^2 \|\mathbf{f}\|^2}. \tag{24}$$

Differentiating this expression with respect to $\mathbf{f}$, we get the 
numerator of the gradient, which is given by (19)⁷ but with 
$|h|^2$ replaced by $E[|h|^2]$. It can be easily shown that this numerator 
is different from zero if $\mathbf{f} = \mathbf{f}_{MVU}$ or $\mathbf{f} = \mathbf{f}_{MMSE}$. We 
therefore have a formal statement of this result: 

Proposition 2: The MVU and MMSE estimators are not 
optimal channel estimators for the task of minimizing 
$[\mathrm{MSE}_x^{rc}(\mathrm{ZF})]_0$, when the prior channel distribution is known. 
The optimal channel estimator $\mathbf{f}_{opt}^{rcZF}$ satisfies (21), (22) and 
(23), but with $|h|^2$ replaced by $E[|h|^2]$. 

⁴Necessary (Hermitian) transpositions take place, since checking the 
possibility of zeroing the numerator by choosing $\mathbf{f}$ is not affected by these 
operations. 
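The step from the gradient numerator (19) to the necessary condition (21) can be spelled out as follows. The shorthands $N$ and $D$ (numerator and denominator of the ratio in (18), with the outer $\sigma_x^2$ factored out) and $\nabla N$, $\nabla D$ (their gradients) are introduced only for this sketch.

```latex
\begin{align*}
N &= |h|^2|\psi-1|^2 + \sigma_w^2\|\mathbf{f}\|^2 + \sigma_w^2/\sigma_x^2,
  & \nabla N &= |h|^2(\psi-1)^*\mathbf{x}_{tr} + \sigma_w^2\mathbf{f}, \\
D &= |h|^2|\psi|^2 + \sigma_w^2\|\mathbf{f}\|^2,
  & \nabla D &= |h|^2\psi^*\mathbf{x}_{tr} + \sigma_w^2\mathbf{f}, \\
\mathbf{f}^H\nabla N &= |h|^2\left(|\psi|^2 - \psi\right) + \sigma_w^2\|\mathbf{f}\|^2,
  & \mathbf{f}^H\nabla D &= D, \\
0 &= \mathbf{f}^H\left(D\,\nabla N - N\,\nabla D\right)
   = D\left(\mathbf{f}^H\nabla N - N\right)
  & &= D\left(|h|^2(\psi^* - 1) - \sigma_w^2/\sigma_x^2\right).
\end{align*}
```

Since $D > 0$, the last factor must vanish, which gives $\psi^* = 1 + \sigma_w^2/(\sigma_x^2|h|^2)$; the right-hand side is real, so $\psi = \mathbf{f}^H\mathbf{x}_{tr}$ takes the same value, as stated in (21).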
Remarks: 

1) Considering (22) and the corresponding expression for 
the random channel case, we observe that the design 
of the estimator with respect to the end performance 
metric introduces a bias to the MVU estimator in the 
form of scaling, leading to a smaller value of the end 
performance metric than the one that we would obtain 
by using the MVU estimator. This bias introduction 
mechanism has similarities with the introduction of bias 
in estimators to reduce their MSE (the MSE here is the 
average square distance of the parameter estimator from 
the true value of the parameter) [6]. Nevertheless, the 
reader may observe the conceptual differences in the 
motivation and goals behind the end performance metric 
estimator designs presented in this paper and the ideas 
in [6]. 

2) The claimed optimality of the derived estimators in this 
section, and in this paper in general, is with respect to the 
zeroth order performance metrics. These estimators turn 
out to be uniformly better than the MVU and MMSE 
estimators also when comparing against the true end 
performance metrics, as we demonstrate in the simulation 
section. 

3) An alternative way to express eq. (22) is 

$$\mathbf{f}_{opt}^{dcZF} = (1 + \alpha)\,\mathbf{f}_{MVU}, \tag{25}$$

where $\alpha = \sigma_w^2/(\sigma_x^2 |h|^2)$ is the inverse SNR at the 
receiver side. Depending on how we implement the last 
estimator in practice, $\alpha$ turns into a tuning parameter 
controlling the introduction of bias in the MVU estimator. 
We numerically demonstrate this very interesting aspect 
of the derived estimators in Figs. 7 and 8. 

⁵This assumption is usually reasonable in practice. 

⁶Part of this analysis is presented in Subsection IV-A.2. 

⁷Ignoring all the positive scaling terms. 
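The effect of the scaling in (22) and (25) can be checked numerically with the closed form of the zeroth order metric. The sketch below compares the MVU filter with its scaled version and runs a coarse grid search over real scalings of $\mathbf{f}_{MVU}$; the function names, training vector and numerical values are hypothetical.

```python
def inner(f, v):
    """Return f^H v."""
    return sum(fi.conjugate() * vi for fi, vi in zip(f, v))


def zeroth_order_mse_dc(f, x_tr, h, sig_x2, sig_w2):
    """Closed-form zeroth order symbol estimate MSE for h_hat = f^H y_tr."""
    psi = inner(f, x_tr)
    f_norm2 = sum(abs(fi) ** 2 for fi in f)
    err = abs(h) ** 2 * abs(psi - 1) ** 2 + sig_w2 * f_norm2
    pwr = abs(h) ** 2 * abs(psi) ** 2 + sig_w2 * f_norm2
    return (sig_x2 * err + sig_w2) / pwr


def scaled_mvu(x_tr, scale):
    """Return scale * x_tr / ||x_tr||^2."""
    e = sum(abs(xi) ** 2 for xi in x_tr)
    return [scale * xi / e for xi in x_tr]


x_tr = [1 + 1j, 1 - 1j, -1 - 1j]
h, sig_x2, sig_w2 = 0.6 + 0.3j, 1.0, 0.4
alpha = sig_w2 / (sig_x2 * abs(h) ** 2)   # inverse SNR at the receiver side

mse_mvu = zeroth_order_mse_dc(scaled_mvu(x_tr, 1.0), x_tr, h, sig_x2, sig_w2)
mse_opt = zeroth_order_mse_dc(scaled_mvu(x_tr, 1.0 + alpha), x_tr, h, sig_x2, sig_w2)

# Coarse grid search over real scalings c in [0.5, 3.5] of the MVU filter.
grid = [0.5 + 0.001 * k for k in range(3001)]
best = min(grid, key=lambda c: zeroth_order_mse_dc(
    scaled_mvu(x_tr, c), x_tr, h, sig_x2, sig_w2))
```

In this example the grid minimum falls at $c \approx 1 + \alpha$, and the scaled filter attains a strictly smaller zeroth order metric than the MVU filter. Note that the scaling uses the true $|h|^2$, which a receiver would not know in practice.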



B. Discussion on the Optimal Training 

Since the channel estimator is selected in order to optimize 
the final performance metric of the communication system, one 
may consider the problem of optimally selecting the training 
vector $\mathbf{x}_{tr}$ under a training energy constraint $\|\mathbf{x}_{tr}\|^2 \le \mathcal{E}$ 
to serve the same purpose. To optimize the training vector, 
one should first fix the channel estimator. This is a "complementary" 
problem with respect to the approach that we 
have followed so far. Suppose that we use either the MVU 
or the MMSE channel estimator. One can observe that for 
$B = 1$ the problem of optimally selecting the training vector 
is meaningless. Therefore, we will end up using an inferior 
channel estimator (i.e., the MVU or the MMSE) compared to the 
one given by (23) and its random channel counterpart. In 
the case that $B > 1$, fixing for example $\mathbf{f} = \mathbf{f}_{MVU}$, one 
can observe that again the problem of optimally selecting the 
training vector is meaningless. Consider for example the case 
of $[\mathrm{MSE}_x^{rc}(\mathrm{ZF})]_0$. We then have: 

$$\left[\mathrm{MSE}_x^{rc}(\mathrm{ZF})\right]_0 = \frac{\sigma_w^2\left(\sigma_x^2 + \|\mathbf{x}_{tr}\|^2\right)}{E[|h|^2]\,\|\mathbf{x}_{tr}\|^2 + \sigma_w^2},$$

which only depends on $\|\mathbf{x}_{tr}\|^2$. Furthermore, setting $\theta = \|\mathbf{x}_{tr}\|^2$, 
it follows that $\partial [\mathrm{MSE}_x^{rc}(\mathrm{ZF})]_0 / \partial\theta < 0$ at sufficiently 
high SNR, i.e., $[\mathrm{MSE}_x^{rc}(\mathrm{ZF})]_0$ is minimized when $\|\mathbf{x}_{tr}\|^2 = \mathcal{E}$, 
which is intuitively appealing. Therefore, any $\mathbf{x}_{tr}$ with 
energy equal to $\mathcal{E}$ is an equally good training vector for 
the MVU estimator. Thus, for the same $\mathbf{x}_{tr}$, the estimator 
$\mathbf{f} = \mathbf{f}_{opt}^{rcZF}$ will be better than the MVU. Similar conclusions 
can be reached for the MMSE estimator, as well. 

V. Minimizing the Zeroth Order Excess MSE 

We now examine the zeroth order excess MSE in the case 
of the ZF equalizer. 

A. ZF Equalizer with a deterministic channel 

In this case, we have: 

$$\left[\mathrm{MSE}_{xe}^{dc}(\mathrm{ZF})\right]_0 = \frac{|h|^2\sigma_x^2 + \sigma_w^2}{|h|^2} \cdot \frac{|h|^2 |\psi - 1|^2 + \sigma_w^2 \|\mathbf{f}\|^2}{|h|^2 |\psi|^2 + \sigma_w^2 \|\mathbf{f}\|^2}, \tag{26}$$

with $\psi = \mathbf{f}^H \mathbf{x}_{tr}$. The numerator of the gradient of the above expression with 
respect to⁸ $\mathbf{f}$ is given by the following expression: 

$$\left[|h|^2 |\psi|^2 + \sigma_w^2 \|\mathbf{f}\|^2\right]\left[|h|^2 (\psi - 1)^* \mathbf{x}_{tr} + \sigma_w^2 \mathbf{f}\right] - \left[|h|^2 \psi^* \mathbf{x}_{tr} + \sigma_w^2 \mathbf{f}\right]\left[|h|^2 |\psi - 1|^2 + \sigma_w^2 \|\mathbf{f}\|^2\right]. \tag{27}$$

Setting $\mathbf{f} = \mathbf{f}_{MVU}$, one can easily check that the above 
expression becomes zero. Therefore: 

Proposition 3: The MVU is an optimal channel estimator 
for the task of minimizing $[\mathrm{MSE}_{xe}^{dc}(\mathrm{ZF})]_0$, when the channel 
is considered a deterministic but otherwise unknown quantity. 

Remark: Note that even if $[\mathrm{MSE}_{xe}^{dc}(\mathrm{ZF})]_0$ depends on the 
unknown channel $h$, the optimal channel estimator does not 
in this case. 
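For the deterministic-channel excess MSE treated in this section, the claim that the gradient numerator (27) vanishes at the MVU filter can be checked numerically. The sketch below evaluates the expression componentwise; the function names and numerical values are hypothetical, and the gradient is taken in the form $|h|^2(\psi-1)^*\mathbf{x}_{tr} + \sigma_w^2\mathbf{f}$ used in the text.

```python
def inner(f, v):
    """Return f^H v."""
    return sum(fi.conjugate() * vi for fi, vi in zip(f, v))


def grad_numerator_excess_dc(f, x_tr, h, sig_w2):
    """Componentwise gradient numerator of the zeroth order excess MSE, cf. (27)."""
    h2 = abs(h) ** 2
    psi = inner(f, x_tr)
    f2 = sum(abs(fi) ** 2 for fi in f)
    den = h2 * abs(psi) ** 2 + sig_w2 * f2        # E|h_hat|^2
    num = h2 * abs(psi - 1) ** 2 + sig_w2 * f2    # E|h - h_hat|^2
    d_num = [h2 * (psi - 1).conjugate() * xi + sig_w2 * fi for xi, fi in zip(x_tr, f)]
    d_den = [h2 * psi.conjugate() * xi + sig_w2 * fi for xi, fi in zip(x_tr, f)]
    return [den * a - num * b for a, b in zip(d_num, d_den)]


x_tr = [1 + 1j, 1 - 1j, -1 - 1j]
energy = sum(abs(xi) ** 2 for xi in x_tr)
f_mvu = [xi / energy for xi in x_tr]
f_shrunk = [0.8 * fi for fi in f_mvu]   # a biased filter, for comparison

at_mvu = grad_numerator_excess_dc(f_mvu, x_tr, h=0.6 + 0.3j, sig_w2=0.4)
at_shrunk = grad_numerator_excess_dc(f_shrunk, x_tr, h=0.6 + 0.3j, sig_w2=0.4)
```

At the MVU filter all components are zero up to rounding, whereas the shrunk filter leaves a clearly nonzero numerator.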



B. ZF Equalizer with a random channel 

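In this case, the prior statistics of the channel are known. 
The zeroth order excess MSE is given by: 

$$\left[\mathrm{MSE}_{xe}^{rc}(\mathrm{ZF})\right]_0 = \frac{|\psi - 1|^2\left(E[|h|^2]\sigma_w^2 + E[|h|^4]\sigma_x^2\right)}{E[|h|^4]\,|\psi|^2 + \sigma_w^2 \|\mathbf{f}\|^2 E[|h|^2]} + \frac{\sigma_w^2 \|\mathbf{f}\|^2\left(E[|h|^2]\sigma_x^2 + \sigma_w^2\right)}{E[|h|^4]\,|\psi|^2 + \sigma_w^2 \|\mathbf{f}\|^2 E[|h|^2]}, \tag{28}$$

with $\psi = \mathbf{f}^H \mathbf{x}_{tr}$. Differentiating this expression w.r.t. $\mathbf{f}$ and setting $\mathbf{f} = \mathbf{f}_{MVU}$, 
we zero the gradient. Therefore: 

Proposition 4: The MVU is an optimal channel estimator 
for the task of minimizing $[\mathrm{MSE}_{xe}^{rc}(\mathrm{ZF})]_0$, when the channel 
is considered random. 

Via tedious calculations, we can show that the MMSE 
channel estimator does not zero the gradient. 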

Remark: This result is counterintuitive: it says that when 
one has knowledge of the channel statistics but uses a ZF 
equalizer, one should ignore these statistics in choosing a 
channel estimator for minimizing the zeroth order excess MSE. 
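The stationarity claims for the random-channel excess MSE can be probed with finite differences instead of the tedious calculation. The sketch below assumes $h \sim \mathcal{CN}(0,1)$, for which $E[|h|^2] = 1$ and $E[|h|^4] = 2$; the function names, the step size and the training vector are hypothetical.

```python
def inner(f, v):
    """Return f^H v."""
    return sum(fi.conjugate() * vi for fi, vi in zip(f, v))


def excess_mse_rc(f, x_tr, m2, m4, sig_x2, sig_w2):
    """Zeroth order excess MSE for a random channel, cf. (28); m2=E|h|^2, m4=E|h|^4."""
    psi = inner(f, x_tr)
    f2 = sum(abs(fi) ** 2 for fi in f)
    num = abs(psi - 1) ** 2 * (m4 * sig_x2 + m2 * sig_w2) \
        + sig_w2 * f2 * (m2 * sig_x2 + sig_w2)
    den = m4 * abs(psi) ** 2 + sig_w2 * f2 * m2
    return num / den


def numerical_gradient(func, f, eps=1e-6):
    """Central differences w.r.t. the real and imaginary part of each tap of f."""
    grads = []
    for i in range(len(f)):
        for step in (eps, 1j * eps):
            up, dn = list(f), list(f)
            up[i] = f[i] + step
            dn[i] = f[i] - step
            grads.append((func(up) - func(dn)) / (2 * eps))
    return grads


x_tr = [1 + 0j, 1 + 0j]
m2, m4, sig_x2, sig_w2 = 1.0, 2.0, 1.0, 0.5   # moments of CN(0, 1)
metric = lambda f: excess_mse_rc(f, x_tr, m2, m4, sig_x2, sig_w2)

f_mvu = [xi / 2.0 for xi in x_tr]
f_mmse = [m2 * xi / (m2 * 2.0 + sig_w2) for xi in x_tr]

grad_mvu = numerical_gradient(metric, f_mvu)
grad_mmse = numerical_gradient(metric, f_mmse)
```

The numerical gradient is zero (to within the finite-difference error) at the MVU filter and clearly nonzero at the MMSE filter, in line with Proposition 4 and the statement that follows it.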

VI. Minimizing the Zeroth Order Probability of Error for the ML Detector 

It is straightforward to see that the decision rule given by 
(4) is equivalent to: 

$$\mathrm{dec}[x(n)](h) = \arg\min_{x(n) \in \mathcal{X}} \left|\frac{y(n)}{h} - x(n)\right|^2. \tag{29}$$

With a given channel estimate, $h$ is replaced by $\hat{h}$ in the last 
expression⁹. 

⁸Discarding the positive scalars and considering again the corresponding 
(Hermitian) transpositions. 

⁹Notice that this does not generalize to ISI and/or MIMO channels. 

In the case of a perfectly known channel, the division 
$y(n)/h = x(n) + w(n)/h$ results in an AWGN channel 
with information bearing signal power $\sigma_x^2$ and noise variance 
$\sigma_w^2/|h|^2$. If only an estimate of the channel, $\hat{h} = h + e$, 
is available, then the division results in $y(n)/\hat{h} = x(n) + 
(w(n) - e\,x(n))/\hat{h}$. Here, $e$ is the channel estimation error, 
which is Gaussian distributed according to our assumptions 
with $E[e] = h\left(\mathbf{f}^H \mathbf{x}_{tr} - 1\right)$ and variance $\sigma_e^2 = \sigma_w^2 \|\mathbf{f}\|^2$. 
Also, $E[|e|^2] = |h|^2 |\mathbf{f}^H \mathbf{x}_{tr} - 1|^2 + \sigma_w^2 \|\mathbf{f}\|^2$. 
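The moments of the estimation error quoted above follow directly from the training model (2) and the linear estimator (3). A short derivation, assuming white training noise with variance $\sigma_w^2$ per sample:

```latex
\begin{align*}
\hat{h} &= \mathbf{f}^H\mathbf{y}_{tr} = h\,\mathbf{f}^H\mathbf{x}_{tr} + \mathbf{f}^H\mathbf{w}_{tr}, \\
e &= \hat{h} - h = h\left(\mathbf{f}^H\mathbf{x}_{tr} - 1\right) + \mathbf{f}^H\mathbf{w}_{tr}, \\
E[e] &= h\left(\mathbf{f}^H\mathbf{x}_{tr} - 1\right), \qquad
\sigma_e^2 = E\left[|\mathbf{f}^H\mathbf{w}_{tr}|^2\right] = \sigma_w^2\|\mathbf{f}\|^2, \\
E\left[|e|^2\right] &= |E[e]|^2 + \sigma_e^2
   = |h|^2\left|\mathbf{f}^H\mathbf{x}_{tr} - 1\right|^2 + \sigma_w^2\|\mathbf{f}\|^2, \\
\frac{y(n)}{\hat{h}} &= \frac{(\hat{h} - e)\,x(n) + w(n)}{\hat{h}}
   = x(n) + \frac{w(n) - e\,x(n)}{\hat{h}}.
\end{align*}
```

For an unbiased filter ($\mathbf{f}^H\mathbf{x}_{tr} = 1$) the mean of $e$ vanishes and only the variance term remains.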

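For the case of most common constellations and an AWGN 
channel, the error probability is given by [22]: 

$$P_e \approx a\,Q\left(b\sqrt{\mathrm{SNR}}\right), \tag{30}$$

where $Q(x) = (1/\sqrt{2\pi}) \int_x^{\infty} e^{-t^2/2}\,dt$ and $a, b$ are positive 
constants depending on the geometry of the constellation. With 
a channel estimation error, the useful signal power is again 
$\sigma_x^2$. The noise variable is now $w(n)' = (w(n) - e\,x(n))/\hat{h}$ 
and therefore $E[w(n)'] = 0$. For the power of the noise 
component, we have: 

$$E\left[|w(n)'|^2\right] = E\left[\left|\frac{w(n)}{\hat{h}}\right|^2\right] + E\left[\left|\frac{e\,x(n)}{\hat{h}}\right|^2\right]. \tag{31}$$

Here, we face again the infinite moment problem. Using 
again similar arguments as in the appendix for approximating 
$E[X/Y]$ by $E[X]/E[Y]$ when $Y = |\hat{h}|^2$, we define the 
corresponding zeroth order version of $E[|w(n)'|^2]$: 

$$\left[E\left[|w(n)'|^2\right]\right]_0 = \frac{\sigma_w^2 + \sigma_x^2 E[|e|^2]}{E[|\hat{h}|^2]} = \frac{\sigma_w^2 + \sigma_x^2\left(|h|^2 |\mathbf{f}^H \mathbf{x}_{tr} - 1|^2 + \sigma_w^2 \|\mathbf{f}\|^2\right)}{|h|^2 |\mathbf{f}^H \mathbf{x}_{tr}|^2 + \sigma_w^2 \|\mathbf{f}\|^2}. \tag{32}$$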



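A variation of the error probability performance metric for 
any of the commonly used linear modulation schemes, named 
zeroth order error probability, will be given by the following 
expression: 

$$[P_e]_0 = a\,Q\left(b\sqrt{\frac{\sigma_x^2\left(|h|^2 |\mathbf{f}^H \mathbf{x}_{tr}|^2 + \sigma_w^2 \|\mathbf{f}\|^2\right)}{\sigma_x^2\left(|h|^2 |\mathbf{f}^H \mathbf{x}_{tr} - 1|^2 + \sigma_w^2 \|\mathbf{f}\|^2\right) + \sigma_w^2}}\right) = a\,Q\left(\frac{b\,\sigma_x}{\sqrt{[\mathrm{MSE}_x^{dc}(\mathrm{ZF})]_0}}\right), \tag{33}$$

where we have used the zeroth order SNR approximation 
given by: 

$$[\mathrm{SNR}]_0 = \frac{\sigma_x^2}{[\mathrm{MSE}_x^{dc}(\mathrm{ZF})]_0}.$$

Clearly, (33) is an artificial performance metric that appears 
in this paper for the sake of our arguments. It is used as 
a variation of the error probability to help us extract useful 
conclusions. 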
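The zeroth order error probability can be evaluated in closed form for any linear estimating filter. The sketch below uses the identity $Q(x) = \tfrac{1}{2}\mathrm{erfc}(x/\sqrt{2})$; the constants `a` and `b` default to 1 only as placeholders (their actual values depend on the constellation), and the function names and numerical values are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def zeroth_order_snr(f, x_tr, h, sig_x2, sig_w2):
    """[SNR]_0 = sigma_x^2 / [MSE_x^dc(ZF)]_0 for h_hat = f^H y_tr."""
    psi = sum(fi.conjugate() * xi for fi, xi in zip(f, x_tr))
    f2 = sum(abs(fi) ** 2 for fi in f)
    err = abs(h) ** 2 * abs(psi - 1) ** 2 + sig_w2 * f2
    pwr = abs(h) ** 2 * abs(psi) ** 2 + sig_w2 * f2
    mse0 = (sig_x2 * err + sig_w2) / pwr
    return sig_x2 / mse0


def zeroth_order_pe(f, x_tr, h, sig_x2, sig_w2, a=1.0, b=1.0):
    """Zeroth order error probability a * Q(b * sqrt([SNR]_0)); a, b are placeholders."""
    return a * q_function(b * math.sqrt(zeroth_order_snr(f, x_tr, h, sig_x2, sig_w2)))


x_tr = [1 + 1j, 1 - 1j, -1 - 1j]
h, sig_x2, sig_w2 = 0.6 + 0.3j, 1.0, 0.4
energy = sum(abs(xi) ** 2 for xi in x_tr)
alpha = sig_w2 / (sig_x2 * abs(h) ** 2)

f_mvu = [xi / energy for xi in x_tr]
f_scaled = [(1.0 + alpha) * fi for fi in f_mvu]

pe_mvu = zeroth_order_pe(f_mvu, x_tr, h, sig_x2, sig_w2)
pe_scaled = zeroth_order_pe(f_scaled, x_tr, h, sig_x2, sig_w2)
```

Because $Q$ is decreasing, the filter with the smaller zeroth order MSE also has the smaller zeroth order error probability, whatever positive values `a` and `b` take.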

Since $Q(x)$ is a strictly decreasing function, the zeroth 
order probability of error for a given channel $h$ is minimized 
when $[\mathrm{MSE}_x^{dc}(\mathrm{ZF})]_0$ is minimized. Therefore, the results of 
Subsection IV-A apply: 

Proposition 5: The MVU estimator is not an optimal channel 
estimator for the task of minimizing $[P_e]_0$ given the true 
channel, when using any of the well-known digital modulations 
in a flat-fading AWGN channel. 

Suppose now that we average $[P_e]_0$ with respect to any 
given channel distribution. We can then make the following 
statement: 

Proposition 6: The MVU estimator is not an optimal channel 
estimator for the task of minimizing the average $[P_e]_0$, 
when using any of the well-known digital modulations in a 
flat-fading AWGN channel. 

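Proof: Assume that the pdf of the fading coefficient 
magnitude is $p(|h|)$ and $|h| \in [\alpha, \beta]$, $\alpha, \beta > 0$, $\beta$ possibly 
equal to $+\infty$. The average $[P_e]_0$ is given by the expression: 

$$\int_{\alpha}^{\beta} a\,Q\left(b\sqrt{[\mathrm{SNR}]_0}\right) p(|h|)\,d|h|. \tag{34}$$

Assuming that the differentiation and integral operators can be 
interchanged, we can set the gradient of the above expression 
with respect to $\mathbf{f}^H$ to zero to get the equation: 

$$\int_{\alpha}^{\beta} a\,\nabla_{\mathbf{f}^H} Q\left(b\sqrt{[\mathrm{SNR}]_0}\right) p(|h|)\,d|h| = \mathbf{0} \;\Longleftrightarrow\; \int_{\alpha}^{\beta} \frac{a\,b}{2\sqrt{[\mathrm{SNR}]_0}} \left.\frac{\partial Q(x)}{\partial x}\right|_{x = b\sqrt{[\mathrm{SNR}]_0}} \nabla_{\mathbf{f}^H} [\mathrm{SNR}]_0 \; p(|h|)\,d|h| = \mathbf{0},$$

where in the second equation we have used the chain rule of 
differentiation. $Q(x)$ is strictly decreasing in $x$, thus 
$\partial Q(x)/\partial x < 0$ 
for any value of $h$. Also, $p(|h|) \ge 0$ for every value of $h$ 
since it is a pdf. Additionally, $[\mathrm{SNR}]_0 > 0$ 
for every value of $h$. Finally, the numerator of $\nabla_{\mathbf{f}^H} [\mathrm{SNR}]_0$ for 
the MVU estimator is given by (20) multiplied by $-1$ and by 
a positive scalar¹⁰. Therefore, it is either positive or negative 
with respect to $h$ in a componentwise fashion depending on the 
sign of the corresponding element in $\mathbf{x}_{tr}$.¹¹ These arguments 
verify that the gradient of (34) is nonzero at $\mathbf{f} = \mathbf{f}_{MVU}$. This concludes the proof. ■ 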

If we assume that the prior distribution of $h$ is known, 
then instead of the MVU, one could use the MMSE channel 
estimator. Plugging $\mathbf{f}_{MMSE}$ into the negative of (19), one can 
obtain that $\nabla_{\mathbf{f}^H} [\mathrm{SNR}]_0 \big|_{\mathbf{f} = \mathbf{f}_{MMSE}} \neq \mathbf{0}$. 

Since in the case of the MMSE estimator, the assumption 
is that we always know the prior channel fading distribution, 
we can make the following statement: 

Proposition 7: The MMSE estimator is not optimal for the 
task of minimizing $[P_e]_0$, when using any of the well-known 
digital modulations in a flat-fading AWGN channel. 

Proof: The result follows along the same lines as in 
Proposition 6. ■ 

The problems of determining the optimal channel estimator 
for the task of minimizing [Pe]o for a given channel h and 
[Pe]o was already solved in Subsections IIV-A.TI and irV-A.2l 

'"The denominator is always positive as a squared term. 

' ' Some of the entries of tc^j. may be zero but not all of them simultaneously. 
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respectively. In the case of [Pe]o' we can only assess their 
optimality analytically, using the following argument: We 
use the upper bouncO Q{x) < (l/a;)(l/V27r)e~^^/^, a; > 
0, which becomes tight as x increases [22]. In our case, 
X — 6-^[SNR]g, and we have already assumed high SNR, 
therefore high [SNR]q, to justify the use of the ZF equalizer. 
Using this bound and the relationship between [SNR]q and 



MSEf (ZF) 



we get: 



g(6J[SNR]o) < 



MSEf (ZF) 



1 



bax 



< 



MSEf (ZF) 



/27r 
1 



.g 2[MSEd<^(ZF)]^ 



/27r 



(35) 



where the last inequality holds for large SNR and therefore 
small $[\mathrm{MSE}_x^{f}(\mathrm{ZF})]_0$. The right hand side function is concave 
with respect to $[\mathrm{MSE}_x^{f}(\mathrm{ZF})]_0 \ge 0$, therefore averaging over 
any channel distribution and applying Jensen's inequality, we get: 

$$E_h\left[Q\left(b\sqrt{[\mathrm{SNR}]_0}\right)\right] \le \frac{\sqrt{E_h\left[[\mathrm{MSE}_x^{f}(\mathrm{ZF})]_0\right]}}{b\sigma_x\sqrt{2\pi}}.$$

We can use one more time the zeroth order approximation to 
approximate $E_h\left[[\mathrm{MSE}_x^{f}(\mathrm{ZF})]_0\right]$. The right hand side is 
minimized when this last zeroth order approximation is minimized. 
Thus, the estimators derived in Subsections IV-A.1 and IV-A.2 
are optimal for the task of minimizing $[P_e]_0$ averaged over $h$, in the sense that 
they minimize an upper bound to $E_h\left[Q\left(b\sqrt{[\mathrm{SNR}]_0}\right)\right]$. 
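
The two ingredients of this argument, the tail bound on $Q(x)$ and the concavity (Jensen) step, can be checked numerically. The following is a minimal Python sketch under stated assumptions: the constants `b` and `sigma_x` are set to 1 and the per-channel MSE values are purely illustrative; none of these numbers are taken from the paper.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def q_upper_bound(x: float) -> float:
    """Upper bound exp(-x^2 / 2) / (x * sqrt(2 * pi)), valid for x > 0."""
    return math.exp(-0.5 * x * x) / (x * math.sqrt(2.0 * math.pi))


def sqrt_bound(mse: float, b: float = 1.0, sigma_x: float = 1.0) -> float:
    """Loosest member of the chain: sqrt(mse) / (b * sigma_x * sqrt(2 * pi))."""
    return math.sqrt(mse) / (b * sigma_x * math.sqrt(2.0 * math.pi))


# Ratio bound / Q(x): stays above 1 and approaches 1 as x grows.
tightness = {x: q_upper_bound(x) / q_function(x) for x in (0.5, 1.0, 2.0, 4.0, 8.0)}

# Jensen step: the average of the concave sqrt bound over hypothetical
# per-channel MSE values never exceeds the bound at the average MSE.
mse_samples = [0.01, 0.05, 0.2]  # illustrative values only
mean_of_bounds = sum(sqrt_bound(m) for m in mse_samples) / len(mse_samples)
bound_of_mean = sqrt_bound(sum(mse_samples) / len(mse_samples))

for x, ratio in tightness.items():
    print(f"x = {x:4.1f}   bound / Q(x) = {ratio:.4f}")
print(f"mean of bounds = {mean_of_bounds:.4f}, bound of mean = {bound_of_mean:.4f}")
```

The ratio printed in the first loop illustrates why the bound is only informative at high $[\mathrm{SNR}]_0$: for small arguments it overestimates $Q(x)$ considerably.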

Remark: Although we have shown that the MVU and 
MMSE estimators are not optimal for the task of minimizing 
the zeroth order probability of error, we will see in the simulation 
section that their actual probability of error performance 
is almost identical to that of the optimal estimators for the 
zeroth order probability of error. This is due to two facts: first, 
the zeroth order probability of error is a variation of the actual 
probability of error, and second, in practice the difference in 
the channel estimates must be large enough to give rise to 
a notable difference in the probability of error. Nevertheless, 
we conjecture that such a difference may be clearer in 
the case of multiple input multiple output (MIMO) systems 
if tight approximations of the error probability functions are 
used to derive the corresponding channel estimators. 

VII. Simulations 

In this section we present numerical results to verify our 
analysis. In all figures, $h \sim \mathcal{CN}(0, 1)$ and QPSK modulation 
is assumed. The SNR during training indicates how good 
the channel estimate is. The parameter $\lambda$ has been empirically 
selected to be 0.1. All schemes in Figs. 4 to 8 use (15) for the 
same $\lambda$. In Figs. 1 to 6, the second moment assumed under the 
uniform pdf, $E^{u}[|h|^2]$, is chosen to be $3E[|h|^2] = 3$, i.e., the real 
and imaginary parts of $h$ are assumed i.i.d. 
following a uniform distribution in $[-3/\sqrt{2}, 3/\sqrt{2}]$. In Figs. 
7 and 8, $E^{u}[|h|^2]$ equals 1/2 and 1/6, respectively. 
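
The link between the interval of the uniform distribution and the assumed second moment can be verified directly: a real variable uniform on $[-a, a]$ has variance $a^2/3$, so two i.i.d. components give $E[|h|^2] = 2a^2/3$. The short Python sketch below checks this for $a = 3/\sqrt{2}$; the helper names are hypothetical, and the half-widths computed for the values 1/2 and 1/6 are derived here rather than quoted from the paper.

```python
import math
import random


def uniform_second_moment(half_width: float) -> float:
    """E[|h|^2] when Re(h), Im(h) are i.i.d. uniform on [-half_width, half_width]."""
    # Each component has variance half_width^2 / 3; the two components add up.
    return 2.0 * half_width ** 2 / 3.0


def half_width_for(second_moment: float) -> float:
    """Inverse map: the half-width that yields a given E[|h|^2]."""
    return math.sqrt(1.5 * second_moment)


a = 3.0 / math.sqrt(2.0)
rng = random.Random(0)
samples = [complex(rng.uniform(-a, a), rng.uniform(-a, a)) for _ in range(100_000)]
empirical = sum(abs(h) ** 2 for h in samples) / len(samples)

print(f"closed form: {uniform_second_moment(a):.4f}, Monte Carlo: {empirical:.4f}")
print(f"half-width for 1/2: {half_width_for(0.5):.4f}, for 1/6: {half_width_for(1.0 / 6.0):.4f}")
```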



[Plot. Panel title: SNR_training = 0 dB, B = 5. Horizontal axis: SNR (dB). Legend: (22) with $|h|^2$ replaced by $E[|h|^2]$ (uniform pdf); MMSE; (22) with $|h|^2$ replaced by $E[|h|^2]$ (true pdf).] 

Fig. 1. $[\mathrm{MSE}_x^{f}(\mathrm{ZF})]_0$ with SNR during training equal to 0 dB and $B = 5$. 



[Plot. Panel title: SNR_training = 0 dB, B = 2. Horizontal axis: SNR (dB). Legend: (22) with $|h|^2$ replaced by $E[|h|^2]$ (uniform pdf); MMSE; (22) with $|h|^2$ replaced by $E[|h|^2]$ (true pdf).] 

Fig. 2. $[\mathrm{MSE}_{xe}^{f}(\mathrm{ZF})]_0$ with SNR during training equal to 0 dB and $B = 2$. 



[Plot. Panel title: SNR_training = 10 dB, B = 5. Horizontal axis: SNR (dB). Legend: (22) with $|h|^2$ replaced by $E[|h|^2]$ (uniform pdf); MMSE; (22) with $|h|^2$ replaced by $E[|h|^2]$ (true pdf).] 

Fig. 3. Average $Q\left(\sqrt{\sigma_x^2/[\mathrm{MSE}_x^{f}(\mathrm{ZF})]_0}\right)$ with SNR during training equal 
to 10 dB and $B = 5$. 



^{12}The usual Chernoff bound can also be used. 

[Plot. Panel title: SNR_training = 0 dB, B = 5, $\lambda$ = 0.1. Horizontal axis: SNR (dB). Legend: (22) with $|h|^2$ replaced by $E[|h|^2]$ (uniform pdf); MMSE; (22) with $|h|^2$ replaced by $E[|h|^2]$ (true pdf).] 

Fig. 4. $\mathrm{MSE}_x^{f}(\mathrm{ZF})$ with SNR during training equal to 0 dB, $B = 5$ and 
$\lambda = 0.1$. Moreover, $E^{u}[|h|^2] = 3$. 

[Plot. Panel title: SNR_training = 0 dB, B = 2, $\lambda$ = 0.1. Horizontal axis: SNR (dB).] 

Fig. 5. $\mathrm{MSE}_{xe}^{f}(\mathrm{ZF})$ with SNR during training equal to 0 dB, $B = 2$ and 
$\lambda = 0.1$. Moreover, $E^{u}[|h|^2] = 3$. 

[Plot. Horizontal axis: SNR (dB).] 

Fig. 6. Average $P_e$ with SNR during training equal to 10 dB, $B = 5$ and 
$\lambda = 0.1$. Moreover, $E^{u}[|h|^2] = 3$. 

[Plot. Panel title: SNR_training = 0 dB, B = 5, $\lambda$ = 0.1. Horizontal axis: SNR (dB).] 

Fig. 7. $\mathrm{MSE}_x^{f}(\mathrm{ZF})$ with SNR during training equal to 0 dB, $B = 5$ and 
$\lambda = 0.1$. Moreover, $E^{u}[|h|^2] = 1/2$. 

[Plot. Panel title: SNR_training = 0 dB, B = 2, $\lambda$ = 0.1. Horizontal axis: SNR (dB). Legend: (22) with $|h|^2$ replaced by $E[|h|^2]$ (uniform pdf); MMSE; (22) with $|h|^2$ replaced by $E[|h|^2]$ (true pdf).] 

Fig. 8. $\mathrm{MSE}_{xe}^{f}(\mathrm{ZF})$ with SNR during training equal to 0 dB, $B = 2$ and 
$\lambda = 0.1$. Moreover, $E^{u}[|h|^2] = 1/6$. 



In Fig. 1, $[\mathrm{MSE}_x^{f}(\mathrm{ZF})]_0$ is presented for $B = 5$ and SNR 
during training equal to 0 dB. The optimal estimators derived 
in this paper are better than the MVU and MMSE estimators. 
Additionally, the MVU estimator appears to be better than the 
MMSE estimator for this performance metric. This is a new 
observation that contradicts what one would expect and supports 
the motivation of this paper. 

Fig. 2 presents the corresponding results for $[\mathrm{MSE}_{xe}^{f}(\mathrm{ZF})]_0$. 
The MVU is the best estimator, as proved. This is another 
example that contradicts what one would expect and supports 
the motivation of this paper. 

Furthermore, Fig. 3 shows the performance of all schemes 
in the case of an approximation to the error probability equal 
to $Q\left(\sqrt{\sigma_x^2/[\mathrm{MSE}_x^{f}(\mathrm{ZF})]_0}\right)$. Here, we have assumed that the 
constants $a$, $b$ are equal to 1, since their specific values are 
irrelevant to the purpose of this simulation plot. The estimators 
derived in this paper are better than the MVU and MMSE 
estimators, as proved in the previous section. The difference between 
the curves is visible in the low SNR regime. 

We now examine the performance of the estimators derived 
in this paper for the true performance metrics. All the 
estimators are implemented based on (15) to combat the infinite 
moment problems. 

In Fig. 4, $\mathrm{MSE}_x^{f}(\mathrm{ZF})$ is presented for $B = 5$ and SNR 
during training equal to 0 dB. The derived optimal^{13} estimators 
in this paper are better than the MVU and MMSE estimators. 
We can see that the zeroth order approximations in this case 
are satisfactory even for a low SNR during training, in the 
sense that the corresponding optimal estimators outperform the 
MVU and MMSE estimators for the true performance metric. 
Additionally, the MVU estimator appears to be better than the 
MMSE estimator for this performance metric. This is yet 
another observation that contradicts what one would naturally expect. 
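
To make the simulated quantity concrete, the following is a minimal Monte Carlo sketch in Python of a symbol-estimate MSE for a ZF equalizer driven by a regularized channel estimate. It is an illustration under several assumptions that are not taken from the paper: the pilots have unit energy so that the least-squares (MVU) estimate equals $h$ plus $\mathcal{CN}(0, \sigma^2/B)$ noise, the SNR is defined as $1/\sigma_n^2$ with unit symbol energy, and the rule in `regularize` (keep the phase, floor the magnitude at $\lambda$) merely stands in for the regularization in (15). The estimators based on (22) are not reproduced here.

```python
import math
import random


def cgauss(rng: random.Random, var: float) -> complex:
    """Draw one sample of a circular complex Gaussian CN(0, var)."""
    s = math.sqrt(var / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def regularize(h_hat: complex, lam: float) -> complex:
    """Assumed regularizer: keep the phase, force the magnitude to be >= lam."""
    mag = abs(h_hat)
    if mag >= lam:
        return h_hat
    if mag == 0.0:
        return complex(lam, 0.0)
    return lam * h_hat / mag


def zf_symbol_mse(snr_db, train_snr_db, B, lam, trials=20000, seed=0):
    """Monte Carlo estimate of E|y / h_hat - x|^2 for QPSK and h ~ CN(0, 1)."""
    rng = random.Random(seed)
    noise_var = 10.0 ** (-snr_db / 10.0)
    est_var = 10.0 ** (-train_snr_db / 10.0) / B  # LS error with B unit pilots
    qpsk = [complex(a, b) / math.sqrt(2.0) for a in (1, -1) for b in (1, -1)]
    total = 0.0
    for _ in range(trials):
        h = cgauss(rng, 1.0)
        h_hat = regularize(h + cgauss(rng, est_var), lam)
        x = rng.choice(qpsk)
        y = h * x + cgauss(rng, noise_var)
        total += abs(y / h_hat - x) ** 2
    return total / trials


mse_low_snr = zf_symbol_mse(0.0, 0.0, B=5, lam=0.1)
mse_high_snr = zf_symbol_mse(20.0, 0.0, B=5, lam=0.1)
print(f"MSE at 0 dB: {mse_low_snr:.3f}, MSE at 20 dB: {mse_high_snr:.3f}")
```

Because the regularized estimate never has magnitude below $\lambda$, each squared error is bounded and the sample average is well behaved; without the floor, the average would be dominated by rare draws with $|\hat{h}|$ close to zero.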

Fig. 5 presents the corresponding results for $\mathrm{MSE}_{xe}^{f}(\mathrm{ZF})$. 
The MVU is better than the MMSE estimator, coinciding with 
the analysis based on the zeroth order approximation. Note, 
however, that the other two estimators appear to be better than 
the MVU. To obtain a well-behaved $\mathrm{MSE}_{xe}^{f}(\mathrm{ZF})$ in this case, 
regularization of the same form as in (15) is applied to $h$ to 
avoid values around zero. In this sense, Fig. 5 serves more 
as a proof that the application-oriented estimator selection is 
valid and less as an actual scenario present in the real world. 

Furthermore, Fig. 6 shows the performance of all schemes 
in the case of the error probability performance metric. Monte 
Carlo simulations have been used to compute the actual error 
probability. All schemes coincide because the differences in 
the channel estimates are not large enough to show up in the error 
probability. Nevertheless, these differences may clearly appear 
in a MIMO scenario if tight approximations of the error 
probability function are used to derive the corresponding 
channel estimators. 
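
A Monte Carlo error-probability computation of this kind can be sketched as follows. This is a minimal Python illustration, not the simulation code behind Fig. 6: it assumes unit-energy pilots, a least-squares channel estimate, and an SNR defined as $1/\sigma_n^2$ with unit symbol energy. The sketch also makes one relevant point explicit: for QPSK the ZF decision depends only on the phase of $\hat{h}$, so rescaling the estimate by a positive real factor leaves the decisions unchanged.

```python
import math
import random


def cgauss(rng: random.Random, var: float) -> complex:
    """Draw one sample of a circular complex Gaussian CN(0, var)."""
    s = math.sqrt(var / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def qpsk_bits(z: complex) -> tuple:
    """Quadrant of z, i.e. the QPSK decision for points (+-1 +-1j) / sqrt(2)."""
    return (z.real >= 0.0, z.imag >= 0.0)


def symbol_error_rate(snr_db, train_snr_db, B, trials=20000, seed=0):
    """Monte Carlo symbol error rate of ZF detection with an LS channel estimate."""
    rng = random.Random(seed)
    noise_var = 10.0 ** (-snr_db / 10.0)
    est_var = 10.0 ** (-train_snr_db / 10.0) / B  # LS error with B unit pilots
    errors = 0
    for _ in range(trials):
        h = cgauss(rng, 1.0)
        h_hat = h + cgauss(rng, est_var)
        bits = (rng.random() < 0.5, rng.random() < 0.5)
        x = complex(1.0 if bits[0] else -1.0, 1.0 if bits[1] else -1.0) / math.sqrt(2.0)
        y = h * x + cgauss(rng, noise_var)
        # y / h_hat and y * conj(h_hat) share the same phase, so the quadrant
        # decision is identical and no division by a small number is needed.
        if qpsk_bits(y * h_hat.conjugate()) != bits:
            errors += 1
    return errors / trials


ser_low_snr = symbol_error_rate(0.0, 10.0, B=5)
ser_high_snr = symbol_error_rate(20.0, 10.0, B=5)
print(f"SER at 0 dB: {ser_low_snr:.4f}, SER at 20 dB: {ser_high_snr:.4f}")
```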

Finally, Figs. 7 and 8 demonstrate the validity of Remark 
3 at the end of Subsection IV-A. These plots correspond to 
Figs. 4 and 5 but with $E^{u}[|h|^2] = 1/2$ and $1/6$, respectively. 
They verify that the zeroth order metrics used in this paper 
are good approximations in terms of indicating the structure 
of uniformly better estimators than the MVU and MMSE. 
Nevertheless, the zeroth order metrics cannot really determine 
the best possible bias with respect to the MVU estimator that 
the estimators in this paper must have in order to yield the best 
possible performance against the true performance metrics. 
The bias terms are only optimal with respect to the zeroth 
order metrics. 

VIII. Conclusions 

In this paper, application-oriented channel estimator selec- 
tion has been compared with common channel estimators such 
as the MVU and MMSE estimators. We have shown that 
the application-oriented selection is the right way to choose 
estimators in practice. We have verified this observation based 
on three different performance metrics of interest, namely, the 
symbol estimate MSE, the excess symbol estimate MSE and 
the error probability. 



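^{13}The term "optimal" is used in this case to refer to uniformly better 
estimators than the MVU/MMSE estimators and not to actually optimal 
estimators in the strict sense. The estimators are optimal only with respect to 
the zeroth order metrics. 

Appendix 

This section proposes a simplification of the $\mathrm{MSE}_x^{f}(\mathrm{ZF})$ 
metric for the estimator given in (15) with a fixed $\lambda$. Due to the 
Gaussianity of $y_{tr}$, $\mathrm{MSE}_x^{f}(\mathrm{ZF}) = \infty$ for any $f \neq 0$ (infinite 
moment problem). Using (15), the corresponding mean square 
error becomes: 

$$\mathrm{MSE}_x^{f,\mathrm{reg}}(\mathrm{ZF}) = E\left[\sigma_x^2\left|1 - \frac{h}{f^H y_{tr}}\right|^2 + \frac{\sigma_n^2}{|f^H y_{tr}|^2} \,;\, |f^H y_{tr}| > \lambda\right]\Pr\left\{|f^H y_{tr}| > \lambda\right\} + E\left[\frac{\sigma_x^2}{\lambda^2}\left|h - \lambda\frac{f^H y_{tr}}{|f^H y_{tr}|}\right|^2 + \frac{\sigma_n^2}{\lambda^2} \,;\, |f^H y_{tr}| \le \lambda\right]\Pr\left\{|f^H y_{tr}| \le \lambda\right\}, \tag{36}$$

where ";" denotes conditioning and "reg" signifies the use of 
the regularized channel estimator in (15). To simplify this 
expression, we observe that $\Pr\{|f^H y_{tr}| < \lambda\} = O(\lambda^2)$, 
since by the mean value theorem this probability is equal to the 
area of the region $\{|f^H y_{tr}| < \lambda\}$, which is of order $O(\lambda^2)$, 
multiplied by some value of the probability density function 
of $|f^H y_{tr}|$ in that region, which is of order $O(1)$. In addition, 

$$E\left[\frac{\sigma_x^2}{\lambda^2}\left|h - \lambda\frac{f^H y_{tr}}{|f^H y_{tr}|}\right|^2 + \frac{\sigma_n^2}{\lambda^2} \,;\, |f^H y_{tr}| \le \lambda\right] = O\left(\frac{1}{\lambda^2}\right).$$

If in addition the SNR during training is sufficiently high and 
the probability mass of $|f^H y_{tr}|$ is concentrated around $|h|$, 
then it can be shown that 

$$E\left[\sigma_x^2\left|1 - \frac{h}{f^H y_{tr}}\right|^2 ; |f^H y_{tr}| > \lambda\right] \approx \frac{\sigma_x^2 E\left[|f^H y_{tr} - h|^2 ; |f^H y_{tr}| > \lambda\right]}{E\left[|f^H y_{tr}|^2 ; |f^H y_{tr}| > \lambda\right]}. \tag{37}$$

The same holds even if $f^H y_{tr}$ is a biased estimator of $h$ at 
high training SNR and $|f^H y_{tr}|$ tends to concentrate around a 
value $a$ bounded away from $|h|$ (and of course from 0). 

To show the last claim, we set $X = |f^H y_{tr} - h|^2$ and 
$Y = |f^H y_{tr}|^2$. Since $Y > \lambda^2$, it also holds that $E[Y] > \lambda^2$. 
Furthermore, it can be seen that 

$$\left|E\left[\frac{X}{Y}\right] - \frac{E[X]}{E[Y]}\right| \le \frac{1}{\lambda^4}E\left[\left|X E[Y] - Y E[X]\right|\right]. \tag{38}$$

At high training SNR, $X \to E[X]$ and $Y \to E[Y]$ in the 
mean square sense and therefore it can be easily shown that 
the right hand side of (38) converges to 0. To see this, notice 
that the Cauchy-Schwarz inequality yields 

$$\frac{1}{\lambda^4}E\left[\left|X E[Y] - Y E[X]\right|\right] \le \frac{1}{\lambda^4}\left(E\left[\left|X E[Y] - Y E[X]\right|^2\right]\right)^{1/2} = \frac{1}{\lambda^4}\left(E^2[Y]E[X^2] + E[Y^2]E^2[X] - 2E[XY]E[X]E[Y]\right)^{1/2}. \tag{39}$$

Since $X \to E[X]$ and $Y \to E[Y]$ in the mean square 
sense, $E[X^2] \to E^2[X]$, $E[Y^2] \to E^2[Y]$ and $E[XY] \to E[X]E[Y]$. 
For the last case, notice that 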

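$$\left|E[XY] - E[X]E[Y]\right| = \left|E\left[(X - E[X])(Y - E[Y])\right]\right| \le E\left[|X - E[X]|\,|Y - E[Y]|\right] \le \left(E\left[|X - E[X]|^2\right]\right)^{1/2}\left(E\left[|Y - E[Y]|^2\right]\right)^{1/2}, \tag{40}$$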



where the last inequality follows again from the Cauchy-Schwarz 
inequality. By the mean square convergence of $X$ 
to $E[X]$ and $Y$ to $E[Y]$, the right hand side of (40) tends to 
0. Therefore, the right hand side of (39) tends to 0. 
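
The ratio approximation that this argument justifies, $E[X/Y] \approx E[X]/E[Y]$ at high training SNR, can be probed numerically. The following Python sketch is an illustration only: it fixes a hypothetical channel value `h`, models $f^H y_{tr}$ as $h$ plus $\mathcal{CN}(0, \sigma^2/B)$ noise (the least-squares estimate with unit-energy pilots, which is an assumption), and conditions on $|f^H y_{tr}| > \lambda$ by discarding the other draws.

```python
import math
import random


def cgauss(rng: random.Random, var: float) -> complex:
    """Draw one sample of a circular complex Gaussian CN(0, var)."""
    s = math.sqrt(var / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def ratio_gap(train_snr_db, B=5, lam=0.1, h=complex(0.8, 0.6), trials=20000, seed=0):
    """Relative gap between E[X/Y] and E[X]/E[Y], conditioned on |h_hat| > lam."""
    rng = random.Random(seed)
    est_var = 10.0 ** (-train_snr_db / 10.0) / B
    sum_x = sum_y = sum_ratio = 0.0
    kept = 0
    for _ in range(trials):
        h_hat = h + cgauss(rng, est_var)
        if abs(h_hat) <= lam:
            continue  # outside the conditioning event
        x = abs(h_hat - h) ** 2  # X = |f^H y_tr - h|^2
        y = abs(h_hat) ** 2      # Y = |f^H y_tr|^2
        sum_x += x
        sum_y += y
        sum_ratio += x / y
        kept += 1
    mean_of_ratio = sum_ratio / kept
    ratio_of_means = sum_x / sum_y
    return abs(mean_of_ratio - ratio_of_means) / ratio_of_means


gap_10db = ratio_gap(10.0)
gap_30db = ratio_gap(30.0)
print(f"relative gap at 10 dB: {gap_10db:.4f}, at 30 dB: {gap_30db:.4f}")
```

In this setting the relative gap shrinks as the training SNR grows, in line with the mean square convergence argument.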

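Furthermore, under the high SNR assumption the conditional 
expectations can be approximated by their unconditional 
counterparts, since for a sufficiently small $\lambda$ their difference is due to 
an event of probability $O(\lambda^2)$. Therefore, 

$$E\left[\sigma_x^2\left|1 - \frac{h}{f^H y_{tr}}\right|^2 + \frac{\sigma_n^2}{|f^H y_{tr}|^2} \,;\, |f^H y_{tr}| > \lambda\right] \approx \frac{\sigma_x^2 E\left[|f^H y_{tr} - h|^2\right] + \sigma_n^2}{E\left[|f^H y_{tr}|^2\right]} + O(\lambda^2). \tag{41}$$

Combining all the above results yields 

$$\mathrm{MSE}_x^{f,\mathrm{reg}}(\mathrm{ZF}) \approx \frac{\sigma_x^2 E\left[|f^H y_{tr} - h|^2\right] + \sigma_n^2}{E\left[|f^H y_{tr}|^2\right]} + O(1). \tag{42}$$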
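
The $O(1)$ term arises as the product of a probability of order $O(\lambda^2)$ and a conditional expectation of order $O(1/\lambda^2)$. The first of these orders can be checked in closed form under one assumption made here for illustration: if $f^H y_{tr}$ is zero-mean circular complex Gaussian with variance `var` (a hypothetical value below), then $|f^H y_{tr}|^2$ is exponentially distributed and $\Pr\{|f^H y_{tr}| < \lambda\} = 1 - e^{-\lambda^2/\mathrm{var}}$. The Python sketch compares this expression with a Monte Carlo estimate and shows that the ratio to $\lambda^2$ settles at $1/\mathrm{var}$.

```python
import math
import random


def prob_below(lam: float, var: float) -> float:
    """P(|z| < lam) for z ~ CN(0, var): |z|^2 is exponential with mean var."""
    return -math.expm1(-lam * lam / var)


def prob_below_mc(lam: float, var: float, trials: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    s = math.sqrt(var / 2.0)
    hits = 0
    for _ in range(trials):
        if math.hypot(rng.gauss(0.0, s), rng.gauss(0.0, s)) < lam:
            hits += 1
    return hits / trials


var = 1.2  # hypothetical variance of f^H y_tr
scaled = {lam: prob_below(lam, var) / lam ** 2 for lam in (0.4, 0.2, 0.1, 0.05)}
for lam, value in scaled.items():
    print(f"lambda = {lam:4.2f}   P / lambda^2 = {value:.4f}   (1 / var = {1.0 / var:.4f})")
print(f"Monte Carlo check at lambda = 0.4: {prob_below_mc(0.4, var):.4f} "
      f"vs closed form {prob_below(0.4, var):.4f}")
```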



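The $O(1)$ term is not negligible but for sufficiently small $\lambda$ 
its dependence on $f$ is insignificant. Hence, for a sufficiently 
small $\lambda$ and a sufficiently high SNR during training, minimizing 
$\mathrm{MSE}_x^{f,\mathrm{reg}}(\mathrm{ZF})$ is equivalent to minimizing the following 
approximation, where $\hat{h} = f^H y_{tr}$: 

$$\mathrm{MSE}_x^{f}(\mathrm{ZF}) \approx \sigma_x^2\,\frac{E\left[|\hat{h} - h|^2\right]}{E\left[|\hat{h}|^2\right]} + \sigma_n^2\,\frac{1}{E\left[|\hat{h}|^2\right]}. \tag{43}$$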
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